rm(list=ls())
# Load R package for printing
library(knitr)
Aim
To model and analyze point reference spatial datasets in R by using the R package gstat.
Reading material
Lecture notes and bibliography:
References for R, Rmarkdown, and LaTeX (optional, supplementary material):
New software
We will learn how to use the R package gstat:
gstat: Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation
Datasets involved
Meuse river data set
Source: meuse{sp}
It contains data.frame: meuse (used for training).
Coordinate variables: “x”, “y”
Responses: “cadmium”, “copper”, “lead”, “zinc”
Covariates: “elev”, “dist”, “om”, “ffreq”, “soil”, “lime”, “landuse”
Lecture notes, part 1: Types of spatial data
Jura data set
Source: jura {gstat}
It contains data.frames: jura.pred (used for training), jura.val (used for validation) and jura.grid (used for plotting).
Coordinate variables: “Xloc”, “Yloc”, “long”, “lat”
Responses: “Cd”, “Co”, “Cr”, “Cu”, “Ni”, “Pb”, “Zn”
Covariates: “Landuse” “Rock”
We study how to implement basic point reference modeling and analysis in R computing environment by using the R package gstat (Pebesma, 2004), in conjunction to the R packages sf and sp for handling spatial data objects.
The R package gstat (Pebesma, 2004) provides basic functionality for univariable and multivariable geostatistical modeling, prediction, simulation, and analysis. This includes:
R function grf {geoR} is generates (unconditional) simulations of Gaussian random fields for given covariance parameters.
Important input arguments:
n= number of points (spatial locations) in each simulations.
nx= Number of points in the X direction.
ny= Number of points in the Y direction.
xlims= Limits of the area in the X direction. Defaults to [0,1][0,1].
ylims= Limits of the area in the Y direction. Defaults to [0,1][0,1].
cov.model= correlation function: “matern”, “exponential”, “gaussian”, “spherical”, “circular”, etc…
cov.pars= a vector with 2 elements or an \(n \times 2\) matrix with values of the covariance parameters \(\sigma^2\) (partial sill) and \(\phi\) (range parameter).
nugget= the value of the nugget effect parameter
for more details ?cov.spatial and ?grf.
The following R script simulates \(n=100\) random points. It uses an Exponential covariance function which is set by cov.model = “exponential”, with covariance functions parameters 1 for the partial sill and .25 for the range which is set by cov.pars=. It uses no nugget which is set by nugget = 0. That is
\[ c\left(h\right)=\sigma^{2}\exp\left(-\frac{1}{\phi}\left\Vert h\right\Vert _{1}\right) \]
rm(list=ls())
# simulate
library(geoR)
## --------------------------------------------------------------
## Analysis of Geostatistical Data
## For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
## geoR version 1.9-4 (built on 2024-02-14) is now loaded
## --------------------------------------------------------------
sim.grf <- grf(n = 100, # number of points (spatial locations) in each simulations.
cov.model = "exponential", # matern cov function family
cov.pars = c(1, .25), # a vector with 2 elements with values of the covariance
# parameters \sigma^2(partial sill) and \phi (range parameter)
nugget = 0.0, # the value of the nugget effect parameter
xlims = c(0, 1),
ylims = c(0, 1)
)
## Warning in grf(n = 100, cov.model = "exponential", cov.pars = c(1,
## 0.25), : .Random.seed not initialised. Creating it with by calling runif(1)
## grf: simulation(s) on randomly chosen locations with 100 points
## grf: process with 1 covariance structure(s)
## grf: nugget effect is: tausq= 0
## grf: covariance model 1 is: exponential(sigmasq=1, phi=0.25)
## grf: decomposition algorithm used is: cholesky
## grf: End of simulation procedure. Number of realizations: 1
# create an sf object
sim.grf.df <- data.frame(x = sim.grf$coords[,1],
y = sim.grf$coords[,2],
value = sim.grf$data
)
library(sf)
## Linking to GEOS 3.12.2, GDAL 3.8.5, PROJ 9.3.1; sf_use_s2() is TRUE
sim.grf.sf <- st_as_sf( sim.grf.df,
coords = c("x", "y")
)
# plot
library(ggplot2)
ggplot(data = sim.grf.sf) +
geom_sf(aes(color = value)) +
theme_minimal() +
labs(y= "y", x = "x", title ="GRF")
Task
Simulate and plot a GRF with
\(500\) points,
Gaussian covariance function, with partial sill equal to 2 and the range equal to 1.5
nugget equal to 0.1
Note: At home you may have a look at Algorithm 1
#
# Write your code below.
#
In what follows we work on the geostatistical model specified as a stochastic process \(\left(Z\left(s\right);s\in\mathcal{S}\right)\) with
\[\begin{equation} Z\left(s\right)=\mu\left(s\right)+\delta\left(s\right)\label{eq:-39} \end{equation}\]
where \(\mu\left(s\right)\) is a deterministic linear expansion of known basis functions \(\left\{ \psi_{j}\left(\cdot\right)\right\} _{j=0}^{p}\) and unknown coefficients \(\left\{ \beta_{j}\right\} _{j=0}^{p}\) such as
\[\begin{align*} \mu\left(s\right) & =\sum_{j=0}^{p}\psi_{j}\left(s\right)\beta_{j}=\left(\psi\left(s\right)\right)^{\top}\beta \end{align*}\]
with \(\beta=\left(\beta_{0},...,\beta_{p}\right)^{\top}\) and \(\psi\left(s\right)=\left(\psi_{0}\left(s\right),...,\psi_{p}\left(s\right)\right)^{\top}\).
Also , \(\delta\left(s\right)\) is a zero mean intrinsic random field with semivariogram \(\gamma(\cdot)\).
The R function variogram {gstat} calculates the sample semivariogram cloud and the sample semivariogram from data \(Z(\cdot)\).
In case a linear drift is given, it calculates the above for the residuals \(\delta(\cdot)=Z(\cdot)-\mu(\cdot)\), with options for isotropic / anysotropic semivariogram, and for irregular / regular distance intervals.
Important input arguments
data= data frame where the names in formula are to be found
Outputs
variogram {gstat} returns an object of class “gstatVariogram” that includes:
np: the number of point pairs for this estimate
dist: the average distance of all point pairs considered for this estimate
gamma: the actual sample variogram estimate
The following R script computes and plots the semivariogram cloud corresponding to \(\delta(\cdot)\). It
loads the data set meuse{sp}
computes the semivariogram cloud by considering a drift \(\mu(s)=\beta_0+\beta_1\sqrt{d(s)}\) where \(d(s)\) denotes the covariate “dist” from the dataset meuse{sp}.
plots the semivariogram cloud
rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 2 Compute the sample variogram cloud
library(gstat)
meuse.sv.cld <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
data = meuse.sf, # data frame
cloud = TRUE # if TRUE, it calculates the semivariogram cloud
)
# ...with some investigation
names(meuse.sv.cld)
## [1] "np" "dist" "gamma" "dir.hor" "dir.ver" "id"
head(meuse.sv.cld)
# 3 Plot
plot(meuse.sv.cld,
xlab = 'distance',
ylab = 'gamma',
main = 'semivariogram cloud'
)
# or
# plot(x = cld$dist,
# y = cld$gamma,
# cex = .5,
# pch = 3,
# col = 'blue',
# xlab = 'distance',
# ylab = 'gamma',
# main = 'semivariogram cloud'
# )
Task
Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil but in log-scale. Consider a constant deterministic drift function \(\mu(.)=\beta_0\) where \(\beta_0\) is unknown. Compute and plot the the semivariogram cloud.
rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df,
coords = c("long", "lat"), # specify which columns are the coordinates
crs = 4326 # CRS code
)
#
# Write your code below
#
The following R script computes and plots the non-parametric Matheron’s semi-variogram estimate corresponding to \(\delta(\cdot)\).
For this, function variogram {gstat} uses the input argument cloud=FALSE. It
loads the data set meuse{sp}
computes the sample semivariogram by considering a drift \(\mu(s)=\beta_0+\beta_1\sqrt{d(s)}\) where \(d(s)\) denotes the covariate “dist” from the dataset meuse{sp}.
plots the semivariogram cloud
rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
svgm <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
data = meuse.sf, # data frame
cloud=FALSE
)
#
summary(svgm)
## np dist gamma dir.hor dir.ver
## Min. : 57.0 Min. : 79.29 Min. :0.0882 Min. :0 Min. :0
## 1st Qu.:435.5 1st Qu.: 425.61 1st Qu.:0.1693 1st Qu.:0 1st Qu.:0
## Median :477.0 Median : 796.18 Median :0.1930 Median :0 Median :0
## Mean :458.9 Mean : 799.98 Mean :0.1918 Mean :0 Mean :0
## 3rd Qu.:545.0 3rd Qu.:1169.60 3rd Qu.:0.2315 3rd Qu.:0 3rd Qu.:0
## Max. :589.0 Max. :1543.20 Max. :0.2550 Max. :0 Max. :0
## id
## var1:15
##
##
##
##
##
#
plot(x = svgm$dist,
y = svgm$gamma,
cex = .5,
pch = 3,
col = 'blue',
xlab = 'distance',
ylab = 'gamma',
main = 'sample variogram'
)
#
text(svgm$dist,
svgm$gamma,
svgm$np,
adj = c(0,2)
)
Function variogram {gstat} computes the non-parametric Matheron’s semi-variograme estimate for specific angles by using the additional input argument
Furthermore, function variogram {gstat} allows the use of other input arguments such as
cutoff indicating spatial separation distance up to which point pairs are included in semivariance estimates
width indicating the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates
The following script computes the semi-variogram estimate for
angle equal to \(45\) degrees,
cutoff distance equal to \(1000\), and
width equal to \(50\).
rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 2 Compute the semivariogram estimate
svgm.tmp <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
data = meuse.sf, # data frame
alpha = c(45), # direction in plane (x,y),
cutoff = 1000, # spatial separation distance up to which point pairs are included in semivariance estimates
width = 50 # the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates
)
# plot
plot( svgm.tmp )
Task
Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.
Consider a constant deterministic drift function \(\mu(.)=\beta_0\) where \(\beta_0\) is unknown.
Compute and plot the the sample semivariogram cloud with angle equal to \(0\), \(45\), \(60\), and \(90\) degrees, cutoff distance equal to \(1.5\), and width equal to \(0.1\)
rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df,
coords = c("long", "lat"), # specify which columns are the coordinates
crs = 4326 # CRS code
)
#
# Write your code below
#
This is tricky a bit, the main steps are as follows
compute a sample semigariogram given the data by using variogram{gstats} as previously
specify the parametric form of the semivariogram model by using vgm{gstats}
calibrate / estimate / fit the specified parametric semivariogram against the computed sample semivariogram by using fit.variogram{gstats}
The function vgm{gstats} specifies / generates a semivariogram model, or adds a semivariogram model to an existing model with purpose to create a mode general one.
Important input arguments of vgm{gstats} are:
psill (partial) sill of the variogram model component, or model
model semivariogram model type, (e.g. “Exp”, “Sph”, “Gau”, or “Mat”).
range range parameter of the variogram model component; in case of anisotropy: major range / direction of the ellipsoid (in 2D case)
nugget nugget component of the variogram (this basically adds a nugget compontent to the model); if missing, nugget component is omitted
add.to the variogram model to which we want to add a component (structure)
cutoff maximum distance up to which variogram values are computed
kappa smoothness parameter for the Matern class of variogram models
Here is a visual
The basic parametric semivariogram models available in vgm{gstats} are
Examples of the basic parametric semivariogram models using specific parametric values are:
The following script specifies parametric semivariogram model resulting by adding several standard semivariogram models:
\[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma^{2}=0.5\right)+\gamma_{\text{sph}}\left(h;\sigma^{2}=1,\phi=300\right)+\gamma_{\text{sph}}\left(h;\sigma^{2}=0.8,\phi=800\right) \]
where \(\gamma_{\text{nug}}\left(h;\sigma^{2}\right)=\sigma^{2}\text{1}_{\{0\}}\left(\left\Vert h\right\Vert \right)\) is the nugget semivariogram model.
rm(list = ls())
# 1 Specify the semivariogram model
vgm.tmp <- vgm(psill = 0.5,
model = "Nug",
range = 0
)
vgm.tmp <- vgm(psill = 1,
model = "Sph",
range = 300,
add.to = vgm.tmp
)
vgm.tmp <- vgm(psill = 0.8,
model = "Sph",
range = 800,
add.to = vgm.tmp
)
# 2 print the values of the parameters of the specified semivariogram model
print(vgm.tmp)
## model psill range
## 1 Nug 0.5 0
## 2 Sph 1.0 300
## 3 Sph 0.8 800
The function fit.variogram {gstat} fits ranges and/or sills from a simple or nested variogram model to a sample variogram.
Important input arguments:
object= gets the sample variogram, output of variogram
model= variogram model, output of vgm
The following script fits the semivariogram model
\[ \gamma\left(h\right) =\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2} \right) +\gamma_{\text{sph}}\left(h;\sigma_{\text{sph}}^{2},\phi_{\text{sph}}\right) \]
(aka estimates \(\sigma_{\text{nug}}^{2}\), \(\sigma_{\text{sph}}^{2}\), and \(\phi_{\text{sph}}\)) against the sample variogram via weighted least squares. Then it prints the result.
Note that the values in the input arguments related to the semivariogram parameters psill = 0.5, psill = 1, and range = 300 are just initial seed values for the numerical solver used.
rm(list = ls())
library(sp)
library(sf)
# 0 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 1 Specify the semi variogram model
smpl.vgm <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula associated to the trend
data = meuse.sf, # data frame
)
vgm.obj <- vgm(psill = 0.5,
model = "Nug",
range = 0
)
vgm.obj <- vgm(psill = 1, # the value "=1" is just an initial value
model = "Sph",
range = 300,
add.to = vgm.obj
)
# 2 Fit the semi variogram model against the sample variogram
est.vgm <- fit.variogram(object = smpl.vgm, # sample variogram, output of variogram
model = vgm.obj # variogram model, output of vgm
)
# 3 Print the estimated values for the parameters of the semi variogram model
print(est.vgm)
## model psill range
## 1 Nug 0.07981775 0.000
## 2 Sph 0.14905741 872.719
Task
Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.
Consider a constant deterministic drift function \(\mu(.)=\beta_0\) where \(\beta_0\) is unknown.
Fit the parametric family of semivariograms
\[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2}\right)+\gamma_{\text{Exp}}\left(h;\sigma_{\text{Exp}}^{2},\phi_{\text{Exp}}\right) \]
where \(\gamma_{\text{nug}}\left(h;\sigma^{2}\right)=\sigma^{2}\text{1}_{\{0\}}\left(\left\Vert h\right\Vert \right)\) is the nugget semivariogram model. Use suitable initial values for \(\sigma_{\text{nug}}^{2}\), \(\sigma_{\text{Exp}}^{2}\), \(\phi_{\text{Exp}}\)
Consider the sample semivariogram with cutoff distance equal to \(1.5\), and width equal to \(0.1\). Also assume isotropy.
Print the semivariogram plot.
rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df,
coords = c("long", "lat"), # specify which columns are the coordinates
crs = 4326 # CRS code
)
#
# Write your code below
#
The main steps for Universal and simple Kriging are the following
Specify the semivariogram model and fit it against the sample semivariogram estimate by using vgm{gstats}, variogram{gstats}, and fit.variogram{gstats} as previously.
Train the geostatistical model by using gstat {gstat}
Function gstat{gstat} (has similar structure to lm{stats}) produces objects that hold all the information necessary for univariate geostatistical prediction (simple, ordinary or universal kriging).
Important input arguments:
formula= formula that defines the dependent variable as a linear model of independent variables
data= data frame; contains the dependent variable, independent variables, and locations.
model= semivariogram model
beta= for simple kriging the vector with the trend coefficients (including intercept); for ordinary and universal kriging it is omitted.
locations formula with only independent variables that define the spatial data locations (coordinates), e.g. ~x+y;
The following script trains the geostatistical model
\[ Z\left(s\right)=\mu\left(s\right)+\delta\left(s\right) \]
against the meuse{sp} dataset.
The deterministic trend is
\[ \mu\left(s\right)=\beta_{0}+\beta_{1}\underset{=\psi_{1}\left(s\right)}{\underbrace{\sqrt{D\left(s\right)}}} \]
where \(D\left(s\right)\) denotes the variable meuse$distance, \(\beta\) is unknown, and \(\delta\left(s\right)\) is an intrinsic stationary process with variogram \[ \gamma\left(h\right) =\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2} \right) +\gamma_{\text{sph}}\left(h;\sigma_{\text{sph}}^{2},\phi_{\text{sph}}\right) \]
rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 2 Compute the semivariogram estimate
svgm.tmp <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
data = meuse.sf, # data frame
alpha = c(45), # direction in plane (x,y),
cutoff = 1000, # spatial separation distance up to which point pairs are included in semivariance estimates
width = 50 # the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates
)
# set the formula of the ternd
#
frml <- log(zinc) ~ 1 + sqrt(dist)
#
# Estimate the variogram
#
smpl.vgm <- variogram(object = frml,
data = meuse.sf)
#
par.vgm.0 <- vgm(psill = 0.5,
model = "Nug",
range = 0
)
par.vgm.1 <- vgm(psill = 1,
model = "Sph",
range = 300,
add.to = par.vgm.0
)
par.vgm.est <- fit.variogram(object = smpl.vgm,
model = par.vgm.1
)
#
# compute the Kriging equations
#
uni.krige.est <- gstat( formula = frml,
data = meuse.sf,
model = par.vgm.est
)
#
#
# summary
#
summary(uni.krige.est)
## Length Class Mode
## data 1 -none- list
## model 1 -none- list
## call 4 -none- call
R function predict {gstat} provides prediction methods for simple, ordinary, and universal kriging, point- or block-kriging.
Important input arguments
object object of class gstat
newdata data frame with prediction/simulation locations; should contain columns with the independent variables (if present) and the coordinates with names as defined in the data argument of used for training with gstat.
The following script
computes the predictive mean and variance for the geostatistical model at unseen locations included in the data.frame meuse.grid {sp} Note that we have to convert the data.frame meuse.grid to sf object having coordinates and covariates with the same namce as the meuse data we used for training.
prints the predictions
plots the variance of the predictions
#rm(list=ls())
# 0 Get the data => from previous script
# 1 Train the geostatistical model => from the previous script
# 2.1 Get the unseen locations
data(meuse.grid)
meuse.grid.df <- meuse.grid
meuse.grid.sf <- st_as_sf(meuse.grid.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 2.1 Compute the predictions at the unseen locations
uni.krige.prd <- predict(object = uni.krige.est, # object of class gstat; output of gstat()
newdata = meuse.grid.sf # data frame with prediction locations
)
## [using universal kriging]
# 2.2 Print summaries
print(uni.krige.prd)
## Simple feature collection with 3103 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 178460 ymin: 329620 xmax: 181540 ymax: 333740
## Projected CRS: Amersfoort / RD New
## First 10 features:
## var1.pred var1.var geometry
## 1 7.071055 0.1683803 POINT (181180 333740)
## 2 7.088269 0.1518497 POINT (181140 333700)
## 3 6.785167 0.1537665 POINT (181180 333700)
## 4 6.512937 0.1576136 POINT (181220 333700)
## 5 7.104999 0.1343482 POINT (181100 333660)
## 6 6.804171 0.1368076 POINT (181140 333660)
## 7 6.572142 0.1414102 POINT (181180 333660)
## 8 6.420664 0.1469311 POINT (181220 333660)
## 9 7.023982 0.1177907 POINT (181060 333620)
## 10 6.820917 0.1196876 POINT (181100 333620)
summary(uni.krige.prd)
## var1.pred var1.var geometry
## Min. :4.455 Min. :0.1009 POINT :3103
## 1st Qu.:5.212 1st Qu.:0.1160 epsg:28992 : 0
## Median :5.590 Median :0.1231 +proj=ster...: 0
## Mean :5.702 Mean :0.1299
## 3rd Qu.:6.134 3rd Qu.:0.1376
## Max. :7.477 Max. :0.2112
# 2.4 Plot predictions
library(mapview)
mapview(uni.krige.prd,
zcol= "var1.pred",
layer.name = "predicted mean"
)
mapview(uni.krige.prd,
zcol= "var1.var",
layer.name = "predicted varaince"
)
Task (Simple Kriging)
Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.
Consider the same above geostatistical model with difference that
the deteministic trend is known and equal to
\[ \mu\left(s\right)=\beta_{0}=0.1 \]
where \(\beta_{0}=1\) is a unknown constant –this is simple Kriging.
the semivariogram is parameterised as
\[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2}\right)+\gamma_{\text{Exp}}\left(h;\sigma_{\text{Exp}}^{2},\phi_{\text{Exp}}\right) \]
Compute the predictive mean and variance at the unseen locations provided by the dataset in the obj jura.grid in jura {gstat} .
rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df,
coords = c("long", "lat"), # specify which columns are the coordinates
crs = 4326 # CRS code
)
jura.grid.df <- as.data.frame(jura.grid)
##################
#
# Write your code
#
##################
Cross validation splits the dataset into two sets: a modeling set and a validation set.
The modeling set is used for semivariogram modelling and kriging on the locations of the validation set, and then the validation measurements can be compared to their predictions.
If all went well, cross validation residuals or z-scores computed as \[ z_i = \frac { Z(s_i)-Z_{[i]}(s_i) } { \sigma_{[i]}(s_i) }, i=1,...,n \]
with \(Z_{[i]}(s_i)\) denoting the cross validation prediction for \(s_i\) and \(\sigma_{[i]}(s_i)\) denoting the corresponding kriging standard error are small, have zero mean, and no apparent structure.
Function gstat.cv{gstat} performs cross validation for simple, ordinary or universal point (co)kriging.
Important arguments
object= object of class gstat;
nfold= if set to e.g. 5, five-fold cross validation is done
The following script
this code is not provided. Write down your own code.
use a bubble plot bubble {sp} with arguments obj=
rm(list = ls())
library(sp)
library(sf)
# 0 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 1 train the geostatistical model
# set the formula of the ternd
frml <- log(zinc) ~ 1 + sqrt(dist)
# Estimate the variogram
smpl.vgm <- variogram(object = frml,
data = meuse.sf)
#
par.vgm.0 <- vgm(psill = 0.5,
model = "Nug",
range = 0
)
par.vgm.1 <- vgm(psill = 1,
model = "Sph",
range = 300,
add.to = par.vgm.0
)
par.vgm.est <- fit.variogram(object = smpl.vgm,
model = par.vgm.1
)
# Train the model
krige.fit <- gstat( formula = frml,
data = meuse.sf,
model = par.vgm.est
)
# 2 Perform n-fold Cross Validation
nf.cv = gstat.cv(krige.fit, # object of class gstat
nfold = 5
)
## [using universal kriging]
## [using universal kriging]
## [using universal kriging]
## [using universal kriging]
## [using universal kriging]
# 3 Plot the z-scores
bubble(obj = nf.cv["zscore"])
Task
Continuing your previous task, where you trained the geostatistical model with difference that
the deteministic trend is known and equal to
\[ \mu\left(s\right)=\beta_{0}=0.1 \]
where \(\beta_{0}=1\) is a unknown constant –this is simple Kriging.
the semivariogram is parameterised as
\[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2}\right)+\gamma_{\text{Exp}}\left(h;\sigma_{\text{Exp}}^{2},\phi_{\text{Exp}}\right) \]
against the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.
Perform a 2-fold cross validation and plot the z-scores.
Is the fitting satisfactory?
rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df,
coords = c("long", "lat"), # specify which columns are the coordinates
crs = 4326 # CRS code
)
jura.grid.df <- as.data.frame(jura.grid)
jura.grid.sf <- st_as_sf(jura.grid.df,
coords = c("long", "lat"), # specify which columns are the coordinates
crs = 4326 # CRS code
)
#
# set the formula of the ternd
#
frml <- log(Cd) ~ 1
#
# Estimate the variogram
#
smpl.vgm <- variogram(object = frml,
data = jura.pred.sf)
#
par.vgm.model <- vgm(psill = 0.1,
model = "Nug",
range = 0.0
)
par.vgm.model <- vgm(psill = 0.5,
model = "Exp",
range = 1.5,
add.to = par.vgm.model
)
par.vgm.est <- fit.variogram(object = smpl.vgm,
model = par.vgm.model
)
#
# compute the Kriging equations
#
sim.krige.est <- gstat( formula = frml,
data = jura.pred.sf,
model = par.vgm.est,
beta = c(0.1) #for simple kriging, vector with the trend coefficients (including intercept)
)
##################
#
# Write your code
#
##################
Function predict {gstat} can also perform block kriging prediction.
Additional important input arguments:
Some examples:
For regular blocks, by specifying a block size
E.g. for blocks of size \(50 \times 50\)
block=c(50,50)
For irregular but constant ‘blocks’, by specifying points that discretise the irregular form
E.g. for blocks of circular shape with radius \(20\)
xy <- expand.grid(x = seq(-20, 20, 4), y = seq(-20, 20,4)) xy <- xy[(xy\(x^2 + xy\)y^2) <= 20^2, ]
The following script computes and prints block Kriging prediction (mean and variance) for blocks of size \(50 \times 50\) given the spatial locations meuse.grid {sp}
rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
#
frml <- log(zinc) ~ 1 + sqrt(dist)
#
smpl.vgm <- variogram(object = frml,
data = meuse.sf)
#
par.vgm.0 <- vgm(psill = 0.5,
model = "Nug",
range = 0
)
par.vgm.1 <- vgm(psill = 1,
model = "Sph",
range = 300,
add.to = par.vgm.0
)
par.vgm.est <- fit.variogram(object = smpl.vgm,
model = par.vgm.1
)
#
krige.est <- gstat( formula = frml,
locations = meuse.sf,
model = par.vgm.est
)
data(meuse.grid)
meuse.grid.df <- meuse.grid
meuse.grid.sf <- st_as_sf(meuse.grid.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
#
krige.prd <- predict(object = krige.est, # object of class gstat; output of gstat()
newdata = meuse.grid.sf, # data frame with prediction locations
block = c(50, 50)
)
## [using universal kriging]
#
mapview(krige.prd,
zcol= "var1.pred",
layer.name = "predicted mean"
)
mapview(krige.prd,
zcol= "var1.var",
layer.name = "predicted varaince"
)
Task
Compute and plot the block Kriging prediction (mean and variance) for blocks of circular shape with radius \(20\), centered on the points of meuse.grid {sp}
#######################
# Write your code below
#######################
This is the case where multiple dependent spatial variables are analyzed jointly.
From the same dataset meuse{sp} consider response variables (in log scale)
cadmium: topsoil cadmium concentration, mg kg-1 soil (“ppm”)
copper: topsoil copper concentration, mg kg-1 soil (“ppm”)
lead: topsoil lead concentration, mg kg-1 soil (“ppm”)
zinc: topsoil zinc concentration, mg kg-1 soil (“ppm”)
We need to start organizing the available information.
The organization is performed by sequentially using the function gstat{gstat} in the following manner
rm(list = ls())
library(sp)
library(sf)
# 0 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df,
coords = c("x", "y"), # specify which columns are the coordinates
crs = 28992 # CRS code
)
# 1 organize the multivarioate response
gstat.obj <- NULL
gstat.obj <- gstat(g = gstat.obj,
id = "logCA",
formula = log(cadmium) ~ 1,
data = meuse.sf
)
gstat.obj <- gstat(g = gstat.obj,
id = "logCO",
formula = log(copper) ~ 1,
data = meuse.sf
)
gstat.obj <- gstat(g = gstat.obj,
id = "logLE",
formula = log(lead) ~ 1,
data = meuse.sf
)
gstat.obj <- gstat(g = gstat.obj,
id = "logZI",
formula = log(zinc) ~ 1,
data = meuse.sf
)
gstat.obj
## data:
## logCA : formula = log(cadmium)`~`1 ; data dim = 155 x 12
## logCO : formula = log(copper)`~`1 ; data dim = 155 x 12
## logLE : formula = log(lead)`~`1 ; data dim = 155 x 12
## logZI : formula = log(zinc)`~`1 ; data dim = 155 x 12
The following script computes the sample semivariogram and plots it
smpl.svg <- variogram(object = gstat.obj)
plot(smpl.svg)
Function fit.lmc{gstast} fits a Linear Model of Coregionalization to a Multivariable Sample Variogram.
Important input arguments
v= multivariable sample variogram, output of variogram
g= gstat object, output of gstat
model= variogram model, output of vgm; if supplied this value is used as initial value for each fit
The following script fits the parametric cross semivariogram under the Linear Model of Coregionalization. As parametric semivariogram model consider the sum of Exponential semivariogram and a nugget.
vm.fit <- gstat::fit.lmc(v = smpl.svg, # multivariable sample variogram
g = gstat.obj, # gstat object, output of gstat
model = vgm(1, "Sph", 800, 1) # variogram model, output of vgm
)
Task
Plot the sample and parametric cross semivariogram in the same plot by using the function plot(vm, vm.fit)
##################
#
# Write your code
#
##################
Task
Compute cokrigging predictions (mean and variance) of all the response variables at locations in muse.grid{sp}, and plot them.
Hints:
use functions predict{gstat}
use spplot{gstat} with arguments obj= and zcol=c(“logCA.pred”,“logCO.pred”,… to plot the predictions
use spplot.vcov{gstat} with arguments x= to plot the variances and covariances.
##################
#
# Write your code
#
##################